Separating decision tree complexity
from subcube partition complexity 


Robin Kothari, David Racicot-Desloges, Miklos Santha


Abstract 

The subcube partition model of computation is at least as powerful as decision trees but 
no separation between these models was known. We show that there exists a function whose 
deterministic subcube partition complexity is asymptotically smaller than its randomized decision
tree complexity, resolving an open problem of Friedgut, Kahn, and Wigderson (2002). Our
lower bound is based on the information-theoretic techniques first introduced to lower bound 
the randomized decision tree complexity of the recursive majority function. 

We also show that the public-coin partition bound, the best known lower bound method for
randomized decision tree complexity, which subsumes other general techniques such as block
sensitivity, approximate degree, randomized certificate complexity, and the classical adversary
bound, also lower bounds randomized subcube partition complexity. This shows that none of these
lower bound techniques can prove optimal lower bounds for randomized decision tree complexity,
which answers an open question of Jain and Klauck (2010) and Jain, Lee, and Vishnoi (2014).


1 Introduction 

The decision tree is a widely studied model of computation. While we have made significant progress 
in understanding this model (e.g., see the survey by Buhrman and de Wolf [BdW02]), questions 
from over 40 years ago still remain unsolved [Ros73]. 

In the decision tree model, we wish to compute a function $f : \{0,1\}^n \to \{0,1\}$ on an input
$x \in \{0,1\}^n$, but we only have access to the input via a black box. The black box can be queried
with an index $i \in [n]$, where $[n] = \{1, 2, \ldots, n\}$, and will respond with the value of $x_i$, the
$i$th bit of $x$. The goal is to compute $f(x)$, while minimizing the number of queries made to the
black box.

For a function $f : \{0,1\}^n \to \{0,1\}$, let $D(f)$ denote the deterministic query complexity (or
decision tree complexity) of computing $f$, the minimum number of queries made by a deterministic
algorithm that computes $f$ correctly on all inputs. Let $R_0(f)$ denote the zero-error randomized
query complexity of computing $f$, the minimum expected cost of a zero-error randomized algorithm
that computes $f$ correctly on all inputs. Finally, let $R(f)$ denote the bounded-error randomized
query complexity of computing $f$, the number of queries made in the worst case by a randomized
algorithm that outputs $f(x)$ on input $x$ with probability at least 2/3. More precise definitions can
be found in Section 2.

Several lower bound techniques have been developed for query complexity over the years, most of
which are based on the following observation: A decision tree that computes $f$ and makes $d$ queries
partitions the set of all inputs, the hypercube $\{0,1\}^n$, into a set of monochromatic subcubes where
each subcube has at most $d$ fixed variables. A subcube is a restriction of the hypercube in which
the values of some subset of the variables have been fixed. For example, the set of $n$-bit strings
in which the first variable is set to 0 is a subcube of $\{0,1\}^n$ with one fixed variable. A subcube
is monochromatic if $f$ takes the same value on all inputs in the subcube. This idea is also the
basis of many lower bound techniques in communication complexity [KN06], where a valid protocol
partitions the space of inputs into monochromatic rectangles.

However, not all subcube partitions arise from decision trees, which naturally leads to a
potentially more powerful model of computation. This model is called the subcube partition model
in [FKW02], but has been studied before under different names (see e.g., [BOH90]). The
deterministic subcube partition complexity of $f$, denoted by $D^{sc}(f)$, is the minimum $d$ such that
there is a partition of the hypercube into a set of monochromatic subcubes in which each subcube has
at most $d$ fixed variables. Since a decision tree making $d$ queries always gives rise to such a
partition, we have $D^{sc}(f) \le D(f)$. Similarly, we define zero-error and bounded-error versions of
subcube partition complexity, denoted by $R_0^{sc}(f)$ and $R^{sc}(f)$, respectively, and obtain the
inequalities $R_0^{sc}(f) \le R_0(f)$ and $R^{sc}(f) \le R(f)$. As expected, we also have
$R_0^{sc}(f) \le D^{sc}(f)$ and $R^{sc}(f) \le D^{sc}(f)$.

This brings up the obvious question of whether these models are equivalent. Separating them 
is difficult, precisely because most lower bound techniques for query complexity also lower bound 
subcube partition complexity. The analogous question in communication complexity is also a
long-standing open problem (see [KN06, Open Problem 2.10] or [Juk12, Chapter 3.2]). In fact, Friedgut,
Kahn, and Wigderson [FKW02, Question 1.1] explicitly ask if these measures are asymptotically 
different in the randomized model with zero error: 

Question 1. Is there a function (family) $f = (f_n)$ such that $R_0^{sc}(f) = o(R_0(f))$?

Similarly, one can ask the same question for bounded-error randomized query complexity. The 
main result of this paper resolves these questions: 

Theorem 1. There exists a function $f = (f_h)$, with $f_h : \{0,1\}^{4^h} \to \{0,1\}$, such that
$D^{sc}(f) \le 3^h$, but $D(f) = 4^h$, $R_0(f) \ge 3.2^h$, and $R(f) = \Omega(3.2^h)$.
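
As a rough illustration of the size of this gap, the bounds of Theorem 1 can be restated in terms of the input length $n = 4^h$. The sketch below only performs this change of base; the helper name `exponent_in_n` is ours and does not appear in the paper.

```python
import math

def exponent_in_n(base):
    """Return c such that base**h == n**c when n == 4**h."""
    return math.log(base) / math.log(4)

# Bounds from Theorem 1, rewritten as powers of the input length n = 4^h.
for name, base in [("D^sc upper bound, 3^h", 3.0),
                   ("R lower bound, 3.2^h", 3.2),
                   ("D, 4^h", 4.0)]:
    print(f"{name}: n^{exponent_in_n(base):.3f}")
```

Under this restatement the subcube partition upper bound is roughly $n^{0.79}$ while the randomized query lower bound is roughly $n^{0.84}$, so the separation is polynomial in $n$.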

This shows that query complexity and subcube partition complexity are asymptotically different 
in the deterministic, zero-error, and bounded-error settings. Besides resolving this question, our 
result has another application. We know several techniques to lower bound bounded-error
randomized query complexity, such as approximate polynomial degree [NS95], block sensitivity [Nis91],
randomized certificate complexity [Aar06] and the classical adversary bound [LM08, SS06, Aar08].
All these techniques are subsumed by the partition bound of Jain and Klauck [JK10], which in
turn is subsumed by the public-coin partition bound of Jain, Lee, and Vishnoi [JLV14].
Additionally, this new lower bound is within a quadratic factor of randomized query complexity. In
other words, if $\mathrm{PPRT}(f)$ denotes the bounded-error public-coin partition bound for a function
$f$, we have $\mathrm{PPRT}(f) \le R(f)$ and also $R(f) = O(\mathrm{PPRT}(f)^2)$. This leaves open the
intriguing possibility that this technique is optimal and is asymptotically equal to bounded-error
randomized query complexity. Jain, Lee, and Vishnoi [JLV14] indeed ask the following question:

Question 2. Is there a function (family) $f = (f_n)$ such that $\mathrm{PPRT}(f) = o(R(f))$?

Our result also answers this question, because, as we show in Section 2, $\mathrm{PPRT}(f) \le R^{sc}(f)$.
Thus, our asymptotic separation between $R^{sc}(f)$ and $R(f)$ also separates $\mathrm{PPRT}(f)$ from $R(f)$.

We now provide a high-level overview of the techniques used in this paper. The main result
is based on establishing the various complexities of a certain function. The function we choose is
based on the quaternary majority function 4-MAJ $: \{0,1\}^4 \to \{0,1\}$, defined as the majority of
the four input bits, with ties broken by the first bit. This function has low deterministic subcube
complexity, $D^{sc}(\text{4-MAJ}) \le 3$, but has deterministic query complexity $D(\text{4-MAJ}) = 4$. From
this function, we define an iterated function $\text{4-MAJ}_h$ by composing the function with
itself $h$ times, which gives us a function on $4^h$ bits. Since deterministic query complexity and
deterministic subcube complexity behave nicely under composition, we have $D(\text{4-MAJ}_h) = 4^h$ and
$D^{sc}(\text{4-MAJ}_h) \le 3^h$. These results are further discussed in Section 3. To prove Theorem 1, it
remains to show that the randomized query complexity of this function is $\Omega(3.2^h)$.

We lower bound the randomized query complexity of $\text{4-MAJ}_h$ using a strategy similar to the
information-theoretic technique of Jayram, Kumar, and Sivakumar [JKS03] and its simplification
by Landau, Nachmias, Peres, and Vanniasegaram [LNPV06]. However, the original strategy was
applied to lower bound a symmetric function (iterated 3-MAJ), whereas our function is not
symmetric since the first variable of 4-MAJ is different from the rest. We modify the technique to
apply it to asymmetric functions and establish the claimed lower bound. The lower bound relies on
choosing a “hard distribution” of inputs and establishing a recurrence relation between the complexities
of the function and its subfunctions on this distribution. Unlike 3-MAJ, where there is a natural 
candidate for a hard distribution, our chosen distribution is not obvious and is constrained by the 
fact that it must fit nicely into these recurrence relations. We prove this lower bound in Section 4. 
We end with some discussion and open problems in Section 5. 

2 Preliminaries 

In this section, we formally define the various models of query complexity and subcube partition 
complexity, as well as the partition bound [JK10] and the public-coin partition bound [JLV14]. We then
study the relationships between these quantities. 

For the remainder of the paper, let $f : \{0,1\}^n \to \{0,1\}$ be a Boolean function on $n$ bits
and $x = (x_1, x_2, \ldots, x_n) \in \{0,1\}^n$ be any input. Let $[n]$ denote the set $\{1, 2, \ldots, n\}$ and
let the support of a probability distribution $p$ be denoted by $\mathrm{supp}(p)$. Lastly, we require the
notion of composing two Boolean functions. If $f : \{0,1\}^n \to \{0,1\}$ and $g : \{0,1\}^m \to \{0,1\}$ are
two Boolean functions, the composed function $f \circ g : \{0,1\}^{nm} \to \{0,1\}$ acts on the Boolean string
$y = (y_{11}, \ldots, y_{1m}, y_{21}, \ldots, y_{nm})$ as

$$(f \circ g)(y) = f(g(y_{11}, \ldots, y_{1m}), \ldots, g(y_{n1}, \ldots, y_{nm})).$$
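
The composition can be sketched in a few lines of Python. This is only an illustration of the definition: the helper `compose` and the toy functions `and2` and `or2` are our own names, and inputs are represented as tuples of bits.

```python
def compose(f, n, g, m):
    """Return f∘g on n*m bits: g is applied to each consecutive block of m bits."""
    def composed(y):
        assert len(y) == n * m
        inner = tuple(g(y[i * m:(i + 1) * m]) for i in range(n))
        return f(inner)
    return composed

def and2(x):
    return x[0] & x[1]

def or2(x):
    return x[0] | x[1]

# AND of two ORs, a function on 4 bits.
and_of_ors = compose(and2, 2, or2, 2)
```

For instance, `and_of_ors((0, 1, 1, 0))` evaluates OR on the blocks `(0, 1)` and `(1, 0)` and then takes the AND of the two results.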

2.1 Decision tree or query complexity 

The deterministic query complexity of a function $f$, $D(f)$, is the minimum number of queries made
by a deterministic algorithm that computes $f$ correctly.

Formally, a deterministic decision tree $A$ on $n$ variables is a binary tree in which each leaf is
labeled by either a 0 or a 1, and each internal node is labeled with a value $i \in [n]$. For every
internal node of $A$, one of the two outgoing edges is labeled 0 and the other edge is labeled 1. On an
input $x$, the algorithm $A$ follows the unique path from the root to one of its leaves in the natural
way: for an internal node labeled with the value $i$, it follows the outgoing edge labeled by $x_i$. The
output $A(x)$ of the algorithm $A$ on input $x$ is the label of the leaf of this path. We say that the
decision tree $A$ computes $f$ if $A(x) = f(x)$ for all $x$.

We define the cost of algorithm $A$ on input $x$, denoted by $C(A, x)$, to be the number of bits
queried by $A$ on $x$, that is, the number of internal nodes evaluated by $A$ on $x$. The cost of an
algorithm $A$, denoted $C(A)$, is the worst-case cost of the algorithm over all inputs $x$, that is,
$C(A) = \max_x C(A, x)$. Now, let $\mathcal{D}_n$ denote the set of all deterministic decision trees on $n$
variables and let $\mathcal{D}(f) \subseteq \mathcal{D}_n$ be the set of all deterministic decision trees that
compute $f$. We define the deterministic query complexity of $f$ as $D(f) = \min_{A \in \mathcal{D}(f)} C(A)$.
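
For very small $n$, the quantity $D(f)$ can be computed by exhaustive search over which variable to query next. The sketch below is an illustration under our own representation (inputs as tuples of bits, partial knowledge as a tuple with `None` for unqueried positions); it runs in exponential time and is not an algorithm from the paper.

```python
from functools import lru_cache
from itertools import product

def query_complexity(f, n):
    """Deterministic query complexity D(f) by exhaustive search (small n only)."""
    inputs = list(product((0, 1), repeat=n))

    @lru_cache(maxsize=None)
    def best(partial):
        # partial[i] is 0 or 1 if x_i has been queried, and None otherwise.
        consistent = [x for x in inputs
                      if all(p is None or p == b for p, b in zip(partial, x))]
        if len({f(x) for x in consistent}) == 1:
            return 0  # f is constant on the remaining subcube: this is a leaf
        costs = []
        for i in range(n):
            if partial[i] is None:
                # Query x_i and pay for the worse of the two answers.
                branches = [best(partial[:i] + (b,) + partial[i + 1:])
                            for b in (0, 1)]
                costs.append(1 + max(branches))
        return min(costs)

    return best((None,) * n)

def parity3(x):
    return x[0] ^ x[1] ^ x[2]

def first_bit(x):
    return x[0]
```

Here `query_complexity(parity3, 3)` requires all three bits, while `query_complexity(first_bit, 3)` needs a single query.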

One of the features of deterministic query complexity that we use in this paper is its composition
property [Mon14]. This property is very intuitive: it asserts that the best way to compute the
composition of $f$ and $g$ is to use optimal algorithms for $f$ and $g$ independently.

Proposition 1. For any two Boolean functions f and g, D(f o g) = D(f)D(g). 

We can now move on to randomized analogs of deterministic query complexity. In a randomized
algorithm, the choice of the queries might also depend on some randomness. Formally, a randomized
decision tree $B$ on $n$ variables is defined by a probability distribution $b$ over $\mathcal{D}_n$, that is,
by a function $b : \mathcal{D}_n \to [0,1]$ such that $\sum_{A \in \mathcal{D}_n} b(A) = 1$. On an input $x$, the
algorithm $B$ picks a deterministic decision tree $A$ with probability $b(A)$ and outputs $A(x)$. Thus, for
every $x$, the value $B(x)$ of $B$ on $x$ is a random variable.

We say that a randomized algorithm $B$ computes $f$ with error $\epsilon \ge 0$ if
$\Pr[B(x) = f(x)] \ge 1 - \epsilon$ for all $x$, that is, if $\sum_{A : A(x) = f(x)} b(A) \ge 1 - \epsilon$ for
all $x$. Let $\mathcal{R}_n$ be the set of all randomized decision trees over $n$ bits and let
$\mathcal{R}_\epsilon(f) \subseteq \mathcal{R}_n$ be the set of all randomized decision trees that compute
$f$ with error $\epsilon$. A randomized algorithm $B$ then computes $f$ with zero error if
$\mathrm{supp}(b) \subseteq \mathcal{D}(f)$, that is, the probability distribution $b$ is completely supported on
the set of deterministic decision trees that compute $f$. A zero-error randomized algorithm, also known
as a Las Vegas algorithm, always outputs the correct answer. The cost of a zero-error randomized
algorithm $B$ on $x$ is defined as $C(B, x) = \sum_{A \in \mathcal{D}_n} b(A) C(A, x) = \mathbb{E}[C(A, x)]$, the
expected number of queries made on input $x$. The zero-error randomized query complexity of $f$,
denoted by $R_0(f)$, is defined as $R_0(f) = \min_{B \in \mathcal{R}_0(f)} \max_x C(B, x)$. From the definition of
zero-error randomized query complexity, it is clear that $R_0(f) \le D(f)$. The complexity $R_0(f)$ can
be of strictly smaller order of growth than $D(f)$: there exists a function $f$ for which
$R_0(f) = o(D(f))$, e.g., the iterated NAND-function [SW86].

Randomized algorithms with error $\epsilon > 0$ may give an incorrect answer on an input with
probability up to $\epsilon$. We say that a randomized algorithm is of bounded-error (sometimes called a
Monte Carlo algorithm) if on any input $x$, the probabilistic output is incorrect with probability at
most 1/3. The constant 1/3 is not important and replacing it with any constant strictly between 0 and
1/2 will only change the complexity by a constant multiplicative factor. For $\epsilon > 0$, the cost of an
$\epsilon$-error randomized algorithm $B$ on $x$ is defined as $C(B, x) = \max_{A \in \mathrm{supp}(b)} C(A, x)$, the
maximum number of queries made on input $x$ by an algorithm in the support of $b$. Note how this
definition differs from the one given for the zero-error case. We define the $\epsilon$-error randomized
complexity of $f$ as $R_\epsilon(f) = \min_{B \in \mathcal{R}_\epsilon(f)} \max_x C(B, x)$, and the bounded-error
randomized query complexity of $f$ as $R(f) = R_{1/3}(f)$. Note that this definition is valid only for
$\epsilon > 0$ and does not coincide with $R_0(f)$ defined above for $\epsilon = 0$. Setting $\epsilon = 0$ in this
definition simply gives us the deterministic query complexity $D(f)$. Nonetheless, it is true that
$R(f) = O(R_0(f))$. This distinction is discussed in more detail in Section 5. Lastly, note that for
all $\epsilon > 0$, we have $R_\epsilon(f) \le D(f)$, and that there exist functions for which
$R(f) = o(D(f))$ [SW86].

In order to establish lower bounds on randomized query complexity, it is useful to take a
distributional view of randomized algorithms [Yao77], that is, to consider the performance of
randomized algorithms on a chosen distribution over inputs. Let $\mu$ be a probability distribution over
all possible inputs of length $n$, and let $B$ be a randomized decision tree algorithm. The cost of $B$
under $\mu$ is $C(B, \mu) = \sum_{x \in \{0,1\}^n} \mu(x) C(B, x) = \mathbb{E}_{x \sim \mu}[C(B, x)]$. We define the
$\epsilon$-error distributional complexity of $f$ under $\mu$ as
$\Delta^\mu_\epsilon(f) = \min_{B \in \mathcal{R}_\epsilon(f)} C(B, \mu)$. The following simple fact is the basis of
many lower bound arguments.

Proposition 2. For every distribution $\mu$ over $\{0,1\}^n$, and for all $\epsilon > 0$, we have
$\Delta^\mu_\epsilon(f) \le R_\epsilon(f)$.

Proof. This follows by expanding out the definitions and using the simple inequality between
expectation and maximum:

$$\Delta^\mu_\epsilon(f) = \min_{B \in \mathcal{R}_\epsilon(f)} C(B, \mu) = \min_{B \in \mathcal{R}_\epsilon(f)} \mathbb{E}_{x \sim \mu}[C(B, x)] \le \min_{B \in \mathcal{R}_\epsilon(f)} \max_x C(B, x) = R_\epsilon(f). \tag{1}$$

□

2.2 Subcube partition complexity 

A subcube of the hypercube $\{0,1\}^n$ is a set of $n$-bit strings obtained by fixing the values of some
subset of the variables. In other words, a subcube is the set of all inputs consistent with a partial
assignment of $n$ bits. Formally, a partial assignment on $n$ variables is a function
$a : I_a \to \{0,1\}$, with $I_a \subseteq [n]$. Given a partial assignment $a$, we call
$S(a) = \{y \in \{0,1\}^n : y_i = a(i) \text{ for all } i \in I_a\}$ the subcube generated by $a$. A set
$S \subseteq \{0,1\}^n$ is a subcube of the hypercube $\{0,1\}^n$ if $S = S(a)$ for some partial assignment
$a$ on $n$ variables. Clearly, for every subcube, there exists exactly one such $a$. We denote by $I_S$ the
domain $I_a \subseteq [n]$ of $a$ where $S = S(a)$. For example, the set $\{0100, 0101, 0110, 0111\}$ is a
subcube of $\{0,1\}^4$. It is generated by the partial assignment $a : \{1, 2\} \to \{0,1\}$, where
$a(1) = 0$ and $a(2) = 1$. An alternative representation of a partial assignment is by an $n$-bit string
where a position $i$ takes the value $a(i)$ if $i \in I_a$ and takes the value $*$ otherwise. For this
example, the subcube $\{0100, 0101, 0110, 0111\}$ is generated by the partial assignment $01{*}{*}$.
Finally, another useful representation is in terms of a conjunction of literals that is satisfied
by all strings in the subcube. For example, the subcube $\{0100, 0101, 0110, 0111\}$ consists exactly
of all 4-bit strings that satisfy the formula $\bar{x}_1 \wedge x_2$.
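
The string representation with `*` is convenient for small experiments. The following sketch, with helper names of our own choosing, expands such a pattern into the subcube it generates and recovers the set of fixed positions $I_a$.

```python
from itertools import product

def subcube(pattern):
    """All bit strings consistent with a pattern over the symbols '0', '1', '*'."""
    choices = [(c,) if c in "01" else ("0", "1") for c in pattern]
    return {"".join(bits) for bits in product(*choices)}

def fixed_variables(pattern):
    """1-based indices of the fixed positions, i.e. the domain of the partial assignment."""
    return {i + 1 for i, c in enumerate(pattern) if c != "*"}
```

With this representation, `subcube("01**")` is the four-element set from the example above and `fixed_variables("01**")` is `{1, 2}`.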

The subcube partition model of computation, studied previously in [FKW02, BOH90, CKLS13],
is a generalization of the decision tree model. A partition $\{S_1, \ldots, S_\ell\}$ of $\{0,1\}^n$ is a set
of pairwise disjoint subsets of $\{0,1\}^n$ that together cover the entire hypercube, that is,
$\bigcup_i S_i = \{0,1\}^n$ and $S_i \cap S_j = \emptyset$ for $i \ne j$.

A deterministic subcube partition $P$ on $n$ variables is a partition of $\{0,1\}^n$ with a Boolean value
$s \in \{0,1\}$ associated to each subcube, that is, $P = \{(S_1, s_1), (S_2, s_2), \ldots, (S_\ell, s_\ell)\}$,
where each $S_i$ is a subcube and $\{S_1, \ldots, S_\ell\}$ is a partition of $\{0,1\}^n$. If the assignment $a$
generates $S_i$ for some $i$, we call $a$ a generating assignment for $P$. For any $x$, we let $S_x$ denote
the subcube containing $x$, that is, if $x \in S_i$, then $S_x = S_i$. We define the value $P(x)$ of $P$ on
$x$ as $s_i$.

We say that a deterministic subcube partition $P$ computes $f$ if $P(x) = f(x)$ for all $x$. Note that
every deterministic decision tree algorithm $A$ computing $f$ induces a subcube partition computing
$f$ that consists of the subcubes generated by the partial assignments defined by the root-leaf paths
of the tree and the Boolean values of the corresponding leaves. We define the cost of $P$ on $x$ as
$C(P, x) = |I_{S_x}|$, analogous to the number of queries made on input $x$ in query complexity. We
define the worst-case cost as $C(P) = \max_x C(P, x)$. Let $\mathcal{D}^{sc}_n$ be the set of all deterministic
subcube partitions on $n$ variables and let $\mathcal{D}^{sc}(f) \subseteq \mathcal{D}^{sc}_n$ be those partitions
that compute $f$. We define the deterministic subcube partition complexity of $f$ as
$D^{sc}(f) = \min_{P \in \mathcal{D}^{sc}(f)} C(P)$. Deterministic subcube partition complexity also satisfies a
composition theorem.

Proposition 3. For any $f : \{0,1\}^n \to \{0,1\}$ and $g : \{0,1\}^m \to \{0,1\}$, $D^{sc}(f \circ g) \le D^{sc}(f) D^{sc}(g)$.

Proof. Let $P = \{(S_1, s_1), (S_2, s_2), \ldots, (S_p, s_p)\}$ and $Q = \{(T_1, t_1), \ldots, (T_q, t_q)\}$ be
optimal deterministic subcube partitions computing $f$ and $g$ respectively. Suppose that $S_h$ is
generated by $a_h$ for $h \in [p]$, and that $T_j$ is generated by $b_j$ for $j \in [q]$. Let
$I_{a_h} = \{i_1, \ldots, i_{c_h}\}$. We define the deterministic subcube partition $P \circ Q$ on $nm$ variables
as follows. The generating assignments for $P \circ Q$ are $a_h \circ (b_{j_1}, \ldots, b_{j_{c_h}})$, for all
$h \in [p]$ and $j_1, \ldots, j_{c_h} \in [q]$ that satisfy $a_h(i_k) = t_{j_k}$ for $k \in [c_h]$. When
$|I_{b_{j_k}}| = d_k$, the assignment $e = a_h \circ (b_{j_1}, \ldots, b_{j_{c_h}})$ is defined by
$I_e = \{(1,1), \ldots, (1, d_1), (2,1), \ldots, (c_h, d_{c_h})\}$, and $e(k, r) = b_{j_k}(r)$ for $1 \le r \le d_k$.
The Boolean value associated with $e$ is $s_h$. It is easy to check that $P \circ Q$ computes $f \circ g$ and
that $C(P \circ Q) \le C(P) C(Q)$. □

As in the case of query complexity, we extend deterministic subcube complexity to the
randomized setting. A randomized subcube partition $R$ on $n$ variables is given by a distribution $r$
over all deterministic subcube partitions on $n$ variables. As for randomized decision trees, $R(x)$ is
a random variable and we say that $R$ computes $f$ with error $\epsilon \ge 0$ if
$\Pr[R(x) = f(x)] \ge 1 - \epsilon$ for all $x$. Let $\mathcal{R}^{sc}_n$ be the set of all randomized subcube
partitions over $n$ variables and $\mathcal{R}^{sc}_\epsilon(f) \subseteq \mathcal{R}^{sc}_n$ be the set of all
randomized subcube partitions that compute $f$ with error $\epsilon$.

The cost of a zero-error randomized subcube partition $R$ on $x$ is defined by
$C(R, x) = \mathbb{E}[C(P, x)]$, where the expectation is taken over $R$. For an $\epsilon$-error subcube
partition $R$, with $\epsilon > 0$, the cost on $x$ is $C(R, x) = \max_{P \in \mathrm{supp}(r)} C(P, x)$. For
$\epsilon \ge 0$, we define the $\epsilon$-error randomized subcube complexity of $f$ by
$R^{sc}_\epsilon(f) = \min_{R \in \mathcal{R}^{sc}_\epsilon(f)} \max_x C(R, x)$.

As mentioned before, a deterministic decision tree induces a deterministic subcube partition 
with the same cost and thus a randomized decision tree induces a randomized subcube partition 
with the same cost, which yields the following. 

Proposition 4. For an $n$-bit Boolean function $f : \{0,1\}^n \to \{0,1\}$, we have that $D^{sc}(f) \le D(f)$
and, for all $\epsilon \ge 0$, we have that $R^{sc}_\epsilon(f) \le R_\epsilon(f)$.

2.3 Partition bounds 

In 2010, Jain and Klauck [JK10] introduced a linear programming based lower bound technique for 
randomized query complexity called the partition bound. They showed that it subsumes all known 
general lower bound methods for randomized query complexity, including approximate polynomial 
degree [NS95], block sensitivity [Nis91], randomized certificate complexity [Aar06], and the classical 
adversary bound [LM08, SS06, Aar08]. 

Recently, Jain, Lee, and Vishnoi [JLV14] presented a modification of this method called the 
public-coin partition bound, which is easily seen to be stronger than the partition bound.
Furthermore, they were able to show that the gap between this new lower bound and randomized query
complexity can be at most quadratic. We define these lower bounds formally. 

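Definition 1 (Partition bound). Let $f : \{0,1\}^n \to \{0,1\}$ be an $n$-bit Boolean function and let
$\mathcal{S}_n$ denote the set of all subcubes of $\{0,1\}^n$. Then, for any $\epsilon \ge 0$, let
$\mathrm{prt}_\epsilon(f)$ be the optimal value of the following linear program:

$$\begin{aligned}
\underset{w_{S,z}}{\text{minimize:}} \quad & \sum_{z=0}^{1} \sum_{S \in \mathcal{S}_n} w_{S,z} \cdot 2^{|I_S|} && (2) \\
\text{subject to:} \quad & \sum_{S : x \in S} w_{S,f(x)} \ge 1 - \epsilon \quad (\text{for all } x \in \{0,1\}^n), && (3) \\
& \sum_{S : x \in S} \sum_{z=0}^{1} w_{S,z} = 1 \quad (\text{for all } x \in \{0,1\}^n), && (4) \\
& w_{S,z} \ge 0 \quad (\text{for all } S \in \mathcal{S}_n \text{ and } z \in \{0,1\}). && (5)
\end{aligned}$$

The $\epsilon$-partition bound of $f$ is defined as $\mathrm{PRT}_\epsilon(f) = \frac{1}{2} \log_2(\mathrm{prt}_\epsilon(f))$.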

We now define the public-coin partition bound. Although our definition differs from the original
definition [JLV14], it is not too hard to see that they are equivalent. Before presenting the
definition, recall that $\mathcal{D}^{sc}_n$ is the set of deterministic subcube partitions on $n$ variables,
and $\mathcal{R}^{sc}_\epsilon(f)$ is the set of randomized subcube partitions that compute $f$ with error at
most $\epsilon \ge 0$. For a randomized subcube partition $R \in \mathcal{R}^{sc}_\epsilon(f)$, we let $r$ be the
probability distribution over deterministic subcube partitions corresponding to $R$.

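Definition 2 (Public-coin partition bound). Let $f : \{0,1\}^n \to \{0,1\}$ be an $n$-bit Boolean function.
Then, for any $\epsilon \ge 0$, let $\mathrm{pprt}_\epsilon(f)$ be the optimal value of the following linear program:

$$\begin{aligned}
\underset{R}{\text{minimize:}} \quad & \sum_{z=0}^{1} \sum_{S \in \mathcal{S}_n} \sum_{P : (S,z) \in P} r(P) \cdot 2^{|I_S|} && (6) \\
\text{subject to:} \quad & R \in \mathcal{R}^{sc}_\epsilon(f). && (7)
\end{aligned}$$

The $\epsilon$-public-coin partition bound of $f$ is defined as
$\mathrm{PPRT}_\epsilon(f) = \frac{1}{2} \log_2(\mathrm{pprt}_\epsilon(f))$.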

Using the original definition, it is trivial that $\mathrm{prt}_\epsilon(f) \le \mathrm{pprt}_\epsilon(f)$, since the
public-coin partition bound is defined using the same linear program, with additional constraints.
This statement also holds with the definitions given above, as we now prove.


Proposition 5. For any Boolean function $f$ and for all $\epsilon > 0$, we have that
$\mathrm{prt}_\epsilon(f) \le \mathrm{pprt}_\epsilon(f)$.

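Proof. Let $R'$ be a randomized subcube partition achieving the optimal value for the linear
program of $\mathrm{pprt}_\epsilon(f)$ and $r'$ be the corresponding probability distribution over deterministic
subcube partitions. Then, for all $(S, z)$ where $S$ is a subcube and $z \in \{0,1\}$, let

$$w'_{S,z} = \sum_{P : (S,z) \in P} r'(P). \tag{8}$$

This family of variables satisfies the conditions of the $\mathrm{prt}_\epsilon(f)$ linear program and is such that

$$\sum_{z=0}^{1} \sum_{S \in \mathcal{S}_n} w'_{S,z} \cdot 2^{|I_S|} = \sum_{z=0}^{1} \sum_{S \in \mathcal{S}_n} \sum_{P : (S,z) \in P} r'(P) \cdot 2^{|I_S|}. \tag{9}$$

The right-hand side equals $\mathrm{pprt}_\epsilon(f)$, so the optimal value of the $\mathrm{prt}_\epsilon(f)$ linear
program is at most $\mathrm{pprt}_\epsilon(f)$. □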


Recall that both partition bounds lower bound randomized query complexity, as shown in
[JLV14]. In particular, for all $\epsilon > 0$, $\mathrm{PRT}_\epsilon(f) \le \mathrm{PPRT}_\epsilon(f) \le R_\epsilon(f)$ and,
when $\epsilon = 0$, we have that $\mathrm{PRT}_0(f) \le \mathrm{PPRT}_0(f) \le D(f)$. It is not known if the zero-error
partition bound also lower bounds zero-error randomized query complexity. However, as mentioned, the
partition bounds also lower bound subcube partition complexity, which implies that they lower bound
query complexity. The proof for query complexity easily extends to subcube partition complexity.

Proposition 6. For every Boolean function $f$ and for all $\epsilon > 0$, we have that
$\mathrm{PPRT}_\epsilon(f) \le R^{sc}_\epsilon(f)$ and $\mathrm{PPRT}_0(f) \le D^{sc}(f)$.

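Proof. Let $R' \in \mathcal{R}^{sc}_\epsilon(f)$ be a randomized subcube partition that achieves
$R^{sc}_\epsilon(f)$ and let $r'$ be its corresponding probability distribution over deterministic subcube
partitions. Let $P \in \mathrm{supp}(r')$. By definition, for every $(S, z) \in P$, we have that
$|I_S| \le C(P)$. Also by definition, $C(P) \le R^{sc}_\epsilon(f)$. Furthermore, if
$P = \{(S_1, z_1), (S_2, z_2), \ldots, (S_m, z_m)\}$, then

$$|P| \cdot 2^{n - C(P)} = m \cdot 2^{n - C(P)} \le \sum_{i=1}^{m} 2^{n - |I_{S_i}|} = 2^n. \tag{10}$$

This implies that $|P| \le 2^{C(P)} \le 2^{R^{sc}_\epsilon(f)}$ and, therefore, that

$$\begin{aligned}
\mathrm{pprt}_\epsilon(f) &\le \sum_{z=0}^{1} \sum_{S \in \mathcal{S}_n} \sum_{P : (S,z) \in P} r'(P) \cdot 2^{|I_S|} \le 2^{R^{sc}_\epsilon(f)} \sum_{z=0}^{1} \sum_{S \in \mathcal{S}_n} \sum_{P : (S,z) \in P} r'(P) && (11) \\
&= 2^{R^{sc}_\epsilon(f)} \sum_{P \in \mathrm{supp}(r')} r'(P) \cdot |P| \le 2^{R^{sc}_\epsilon(f)} \cdot 2^{R^{sc}_\epsilon(f)} \sum_{P \in \mathrm{supp}(r')} r'(P) && (12) \\
&= 2^{2 R^{sc}_\epsilon(f)}. && (13)
\end{aligned}$$

The first step holds because $R'$ is feasible for the linear program defining $\mathrm{pprt}_\epsilon(f)$. The
next inequality holds since $|I_S| \le R^{sc}_\epsilon(f)$, and the last inequality uses the fact that
$|P| \le 2^{R^{sc}_\epsilon(f)}$. Taking logarithms gives $\mathrm{PPRT}_\epsilon(f) \le R^{sc}_\epsilon(f)$. Setting
$\epsilon = 0$ gives $\mathrm{PPRT}_0(f) \le D^{sc}(f)$. □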

The following theorem summarizes the known relations between the introduced complexity 
measures. 


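[Figure 1: diagram whose nodes are $R_\epsilon(f)$, $R^{sc}_\epsilon(f)$, $\mathrm{PPRT}_\epsilon(f)$, $D(f)$, $D^{sc}(f)$,
$\mathrm{PPRT}_0(f)$, $R_0(f)$, and $R_0^{sc}(f)$, connected by arrows.]

Figure 1: Relationships between the complexity measures introduced. An arrow from $X$ to $Y$
represents $X \le Y$. For example, $D^{sc}(f) \to D(f)$ means $D^{sc}(f) \le D(f)$.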


Theorem 2. For any Boolean function $f : \{0,1\}^n \to \{0,1\}$ and for all $\epsilon > 0$, the relations
indicated in Figure 1 hold.

Proof. The upper three vertical arrows represent the relations between query complexity and
subcube partition complexity established in Proposition 4. The remaining vertical arrows represent the
relations between the public-coin partition bounds and subcube partition complexity established
in Proposition 6. The other inequalities are immediate and follow from their definitions. □

3 Iterated quaternary majority function 

We now introduce the function we use to separate randomized query complexity from subcube 
partition complexity and establish some of its properties. 







Let MAJ denote the Boolean majority function of its input bits when the number of bits is odd.
The quaternary majority function 4-MAJ $: \{0,1\}^4 \to \{0,1\}$ is defined by
$\text{4-MAJ}(x_1, x_2, x_3, x_4) = x_1(x_2 \vee x_3 \vee x_4) \vee x_2 x_3 x_4$. This function was introduced
in [Sav02]. We call it 4-MAJ, because the output of the function is the majority of its input bits,
with the first variable breaking equality in its favor. In other words, the first variable has two
votes, while the others have one, that is,
$\text{4-MAJ}(x_1, x_2, x_3, x_4) = \mathrm{MAJ}(x_1, x_1, x_2, x_3, x_4)$. This function has previously been used
to separate deterministic decision tree size from deterministic subcube partition size [Sav02]. We
use this function because its subcube partition complexity is smaller than its query complexity.

Proposition 7. We have $D^{sc}(\text{4-MAJ}) = 3$ and $D(\text{4-MAJ}) = 4$.

Proof. Observe that, for any choice of $w \in \{0,1\}$, we have that

$$\text{4-MAJ}(0,0,1,w) = \text{4-MAJ}(0,w,0,1) = \text{4-MAJ}(0,1,w,0) = \text{4-MAJ}(w,0,0,0) = 0$$

and that

$$\text{4-MAJ}(1,1,0,w) = \text{4-MAJ}(1,w,1,0) = \text{4-MAJ}(1,0,w,1) = \text{4-MAJ}(w,1,1,1) = 1.$$

The subcubes generated by these 8 partial assignments are disjoint and of size two, forming a
partition of $\{0,1\}^4$. Thus, with the right Boolean values, they form a deterministic subcube
partition that computes 4-MAJ. Since all partial assignments have length 3, $D^{sc}(\text{4-MAJ}) \le 3$.
Although we do not use the inequality $D^{sc}(\text{4-MAJ}) \ge 3$ in our results, this can be verified by
enumerating all deterministic subcube partitions with complexity 2. Furthermore,
$D(\text{4-MAJ}) \le 4$ since any function can be computed by querying all input bits.
$D(\text{4-MAJ}) \ge 4$ can be shown either by enumerating all decision trees that make 3 queries or by
using the lower bound in the next section. □
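
The partition used in this proof is small enough to check mechanically. The sketch below verifies that the eight partial assignments form a subcube partition computing 4-MAJ. For the lower bound $D^{sc}(\text{4-MAJ}) \ge 3$ it uses a shortcut that differs from the enumeration mentioned in the proof: it checks that every monochromatic subcube containing the input 1000 fixes at least three variables. All function and variable names are ours.

```python
from itertools import product

def maj4(x):
    x1, x2, x3, x4 = x
    return (x1 & (x2 | x3 | x4)) | (x2 & x3 & x4)

def subcube(pattern):
    """Set of bit tuples consistent with a pattern over '0', '1', '*'."""
    choices = [(int(c),) if c in "01" else (0, 1) for c in pattern]
    return set(product(*choices))

# The eight partial assignments from the proof, with their Boolean values.
PARTITION = {"001*": 0, "0*01": 0, "01*0": 0, "*000": 0,
             "110*": 1, "1*10": 1, "10*1": 1, "*111": 1}

def is_partition_computing(f, n, partition):
    """Check that the subcubes are monochromatic, disjoint, and cover {0,1}^n."""
    covered = []
    for pattern, value in partition.items():
        cube = subcube(pattern)
        if any(f(x) != value for x in cube):
            return False
        covered.extend(cube)
    return len(covered) == 2 ** n and len(set(covered)) == 2 ** n

def min_certificate(f, n, x):
    """Fewest fixed variables of a monochromatic subcube containing x."""
    best = n
    for mask in product((False, True), repeat=n):  # True means "fixed"
        pattern = "".join(str(b) if fix else "*" for b, fix in zip(x, mask))
        if all(f(y) == f(x) for y in subcube(pattern)):
            best = min(best, sum(mask))
    return best
```

Since any subcube partition computing 4-MAJ must place 1000 in some monochromatic subcube, `min_certificate(maj4, 4, (1, 0, 0, 0)) == 3` implies that no such partition has cost 2.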

While our results only require us to show lower bounds on the randomized query complexity 
of 4-MAJ, we want to mention that the randomized query complexity of 4-MAJ is indeed smaller 
than its deterministic query complexity. 

Proposition 8. For the 4-MAJ function, $R_0(\text{4-MAJ}) \le 13/4 = 3.25$.

Proof. The randomized algorithm achieving this complexity is simple: with probability 1/4, the 
algorithm queries the first variable and then it checks if the other variables all have the opposite 
value; with probability 3/4, it checks if the last three variables have all the same value and, if not, 
it queries the first variable. □ 
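
The bound 13/4 can be checked by exact enumeration. The proof does not spell out the order in which the last three variables are read, so the sketch below makes two assumptions of ours: in both branches the variables $x_2, x_3, x_4$ are queried in uniformly random order, and the algorithm stops as soon as the output is determined. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def cost_first_then_rest(x, order):
    """Branch taken with probability 1/4: query x1, then scan the others
    until one of them agrees with x1 (which settles the output)."""
    cost = 1
    for i in order:
        cost += 1
        if x[i] == x[0]:
            break
    return cost

def cost_rest_then_first(x, order):
    """Branch taken with probability 3/4: scan x2, x3, x4 until two values
    differ; in that case x1 is also queried and decides the output."""
    seen = []
    for i in order:
        seen.append(x[i])
        if len(set(seen)) == 2:
            return len(seen) + 1
    return 3  # all three agree, no need to look at x1

def expected_cost(x):
    orders = list(permutations((1, 2, 3)))  # 0-based positions of x2, x3, x4
    avg1 = Fraction(sum(cost_first_then_rest(x, o) for o in orders), len(orders))
    avg2 = Fraction(sum(cost_rest_then_first(x, o) for o in orders), len(orders))
    return Fraction(1, 4) * avg1 + Fraction(3, 4) * avg2

worst = max(expected_cost(x) for x in product((0, 1), repeat=4))
```

Under these assumptions the worst-case expected cost `worst` equals 13/4, attained for example on the input 1000.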

Since the 4-MAJ function separates deterministic subcube complexity from deterministic query
complexity, a natural candidate for a function family that separates these measures is the iterated
quaternary majority function, $\text{4-MAJ}_h$, defined recursively on $4^h$ variables, for $h \ge 0$. In the
base case, $\text{4-MAJ}_0$ is the identity function on one bit. For $h > 0$, we define
$\text{4-MAJ}_h = \text{4-MAJ} \circ \text{4-MAJ}_{h-1}$. In other words, for $h > 0$, let $x$ be an input of length
$4^h$, and for $i \in \{1, 2, 3, 4\}$, let $x^{(i)}$ denote the $i$th quarter of $x$, that is,
$|x^{(i)}| = 4^{h-1}$ and $x = x^{(1)} x^{(2)} x^{(3)} x^{(4)}$. Then, we have that

$$\text{4-MAJ}_h(x) = \text{4-MAJ}(\text{4-MAJ}_{h-1}(x^{(1)}), \text{4-MAJ}_{h-1}(x^{(2)}), \text{4-MAJ}_{h-1}(x^{(3)}), \text{4-MAJ}_{h-1}(x^{(4)})).$$
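
The recursive definition translates directly into code. The following sketch, with function names of our choosing, evaluates the iterated function on a list of $4^h$ bits by splitting it into quarters.

```python
def maj4(bits):
    x1, x2, x3, x4 = bits
    return (x1 & (x2 | x3 | x4)) | (x2 & x3 & x4)

def iterated_maj4(x):
    """Evaluate 4-MAJ_h on a sequence of 4**h bits."""
    if len(x) == 1:
        return x[0]  # base case: identity on one bit
    quarter = len(x) // 4
    assert quarter * 4 == len(x), "input length must be a power of 4"
    return maj4([iterated_maj4(x[i * quarter:(i + 1) * quarter])
                 for i in range(4)])
```

For example, on the 16-bit input `1000 1100 1010 0111` the four quarters evaluate to 0, 1, 1, 1, so the overall value is 1.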

The function $\text{4-MAJ}_h$ inherits several properties from 4-MAJ. It has low deterministic subcube
complexity, but high deterministic query complexity:

Proposition 9. For all $h \ge 0$, $D^{sc}(\text{4-MAJ}_h) \le 3^h$ and $D(\text{4-MAJ}_h) = 4^h$.

Proof. For h = 0, the statement is trivial and for h = 1, the statement is Proposition 7. Proposition 1 
and Proposition 3 used recursively imply the result. □ 

We now introduce terminology that we use to refer to this function. We view $\text{4-MAJ}_h$ as defined
by the read-once formula on the complete quaternary tree $T_h$ of height $h$ in which every internal
node is a 4-MAJ gate. We identify the leaves of $T_h$ from left to right with the integers
$1, \ldots, 4^h$. For an input $x \in \{0,1\}^{4^h}$, the bit $x_i$ defines the value of the leaf $i$. We then
evaluate recursively the values of the internal nodes. The value of the root is $\text{4-MAJ}_h(x)$. For
every internal node $v$ in $T_h$, we denote its children by $v_1$, $v_2$, $v_3$ and $v_4$, from left to right.
For any node $v$ in $T_h$, let $Z(v)$ denote the set of variables associated with the leaves in the
subtree rooted at $v$. We say that a node $v$ is at level $\ell$ in $T_h$ if the distance between $v$ and the
leaves is $\ell$. The root is therefore at level $h$, and the leaves are at level 0. For
$0 \le \ell \le h$, the set of nodes at level $\ell$ is denoted by $T_h(\ell)$.

4 Randomized query complexity of $\text{4-MAJ}_h$

In this section, we prove our main technical result, a lower bound on the randomized query
complexity of $\text{4-MAJ}_h$. We prove this by using distributional complexity, that is, by using the
inequality in Proposition 2. First, we define a “hard distribution” $d_h$ for which we will show that
$\Delta^{d_h}_\epsilon(\text{4-MAJ}_h) \ge (1 - 2\epsilon)(16/5)^h$, which implies our main result (Theorem 1).

4.1 The hard distribution 

Intuitively, the distribution we use in our lower bound has to be one on which it is difficult to
compute $\text{4-MAJ}_h$. We start by defining a hard distribution for 4-MAJ and extend it to
$\text{4-MAJ}_h$ in the natural way: by composing it with itself.

The hard distribution $d$ on inputs of length 4 is defined from $d^0$ and $d^1$, the respective hard
distributions for 0-inputs and 1-inputs of length 4, by setting $d(x) = \frac{1}{2} d^b(x)$ when
$\text{4-MAJ}(x) = b$. We define $d^0$ as

$$d^0(1000) = \frac{2}{5}, \quad d^0(0011) = d^0(0101) = d^0(0110) = \frac{1}{6},$$
$$d^0(0001) = d^0(0010) = d^0(0100) = \frac{1}{30}, \quad \text{and} \quad d^0(0000) = 0. \tag{14}$$

The definition of d 1 is analogous, or can be defined by d l (x i,X 2 ,x%, X 4 ) = d°( 1 —xi, 1 —# 2 ,1 —# 3 ,1 — 
X 4 ). Given that the function 4-MAJ is symmetric in X 2 , X 3 , and x' 4 , there are only 4 equivalence 
classes of 0-inputs, to which we have assigned probability masses 2/5,1/2,1/10, and 0, and then 
distributed the probabilities uniformly inside each class. The probabilities were chosen to make the 
recurrence relations in Lemma 1 and Lemma 2 work, while putting more weight on the intuitively 
difficult inputs. For example x = 0000 seems like an easy input since all inputs that are Hamming 
distance 1 from it are also 0-inputs, and thus reading any 3 bits of this input is sufficient to compute 
the function. In Lemma 2 we will give an equivalent characterisation of the hard distribution which 
is more directly related to the recurrence relations in the lemmas. 
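As a sanity check on the definition, the table for $d^0$, its complement $d^1$, and the combined distribution $d$ can be written down with exact rational arithmetic. The sketch below assumes the class masses $2/5$, $1/2$, $1/10$, $0$ are split uniformly within each class (so $1/6$ and $1/30$ per input in the two three-element classes) and that 4-MAJ breaks ties by the first bit; the names `D0`, `D1`, `D`, and `maj4` are illustrative only.

```python
from fractions import Fraction
from itertools import product


def maj4(bits):
    """4-MAJ: majority of four bits, ties broken by the first bit (assumed definition)."""
    ones = sum(bits)
    return bits[0] if ones == 2 else int(ones >= 3)


# d^0: class masses 2/5, 1/2, 1/10 and 0, spread uniformly inside each class.
D0 = {
    (1, 0, 0, 0): Fraction(2, 5),
    (0, 0, 1, 1): Fraction(1, 6),
    (0, 1, 0, 1): Fraction(1, 6),
    (0, 1, 1, 0): Fraction(1, 6),
    (0, 0, 0, 1): Fraction(1, 30),
    (0, 0, 1, 0): Fraction(1, 30),
    (0, 1, 0, 0): Fraction(1, 30),
    (0, 0, 0, 0): Fraction(0),
}

# d^1 is obtained from d^0 by complementing every bit.
D1 = {tuple(1 - b for b in x): p for x, p in D0.items()}

# d(x) = d^b(x) / 2, where b = 4-MAJ(x).
D = {x: (D1 if maj4(x) else D0)[x] / 2 for x in product((0, 1), repeat=4)}
```

Under these assumptions every key of `D0` is a 0-input, every key of `D1` is a 1-input, and each of the three tables sums to 1.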

From this distribution we recursively define, for $h \ge 0$, the hard distribution $d_h$ on inputs of length $4^h$. In the base case, $d_0(0) = d_0(1) = \frac{1}{2}$. For $h > 0$, as for $d$, the distribution $d_h$ is defined from $d_h^0$ and $d_h^1$, the respective hard distributions for 0-inputs and 1-inputs of length $4^h$, by setting $d_h(x) = \frac{1}{2} d_h^b(x)$ when $\text{4-MAJ}_h(x) = b$. Let $x = x^{(1)} x^{(2)} x^{(3)} x^{(4)}$ be a $b$-input, where $x^{(i)}$ is a $b_i$-input of length $4^{h-1}$, for $i \in \{1, 2, 3, 4\}$. Then, $d_h^b(x) = d^b(b_1 b_2 b_3 b_4) \cdot \prod_{i=1}^{4} d_{h-1}^{b_i}(x^{(i)})$. It is easily seen that according to $d_h$, for each node $v$ in $T_h$, if the value of $v$ is $b$, then the children of $v$ have values distributed according to $d^b$. With the additional constraint that the root has uniform distribution over $\{0, 1\}$, this actually gives an alternative definition of $d_h$.
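The alternative, top-down description of $d_h$ lends itself to a simple sampler: draw the root uniformly, then repeatedly expand each node of value $b$ into four children drawn from $d^b$. The following sketch assumes the values of $d^0$ given in (14) (with masses $1/6$ and $1/30$ per input in the two three-element classes) and the tie-breaking rule of 4-MAJ by the first bit; `sample_hard_input` and the other names are hypothetical helpers.

```python
import random

# d^0 from (14); the all-zero input has mass 0 and is omitted.
D0 = {
    (1, 0, 0, 0): 2 / 5,
    (0, 0, 1, 1): 1 / 6, (0, 1, 0, 1): 1 / 6, (0, 1, 1, 0): 1 / 6,
    (0, 0, 0, 1): 1 / 30, (0, 0, 1, 0): 1 / 30, (0, 1, 0, 0): 1 / 30,
}


def maj4(bits):
    """4-MAJ: majority of four bits, ties broken by the first bit (assumed definition)."""
    ones = sum(bits)
    return bits[0] if ones == 2 else int(ones >= 3)


def maj4_h(x):
    """Evaluate 4-MAJ_h bottom-up on a bit list of length 4**h."""
    level = list(x)
    while len(level) > 1:
        level = [maj4(level[i:i + 4]) for i in range(0, len(level), 4)]
    return level[0]


def sample_children(b, rng):
    """Draw the values of the four children of a node with value b from d^b."""
    patterns = list(D0)
    pattern = rng.choices(patterns, weights=[D0[p] for p in patterns], k=1)[0]
    # d^1 is d^0 with every bit complemented.
    return pattern if b == 0 else tuple(1 - c for c in pattern)


def sample_hard_input(h, rng):
    """Return (root value, input x of length 4**h) with x drawn from d_h."""
    root = rng.randrange(2)  # the root is uniform over {0, 1}
    level = [root]
    for _ in range(h):
        level = [c for b in level for c in sample_children(b, rng)]
    return root, level
```

By construction, evaluating `maj4_h` on the sampled input recovers the sampled root value, since every node's children form a $b$-input of 4-MAJ.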
We will also require the notion of a minority path in our proof. For a given input, a minority path is a path from the root to a leaf in which each node has a value different from its parent’s value. (Recall that the value of a node is the function 4-MAJ evaluated on the values of its children.) For example, for the 4-MAJ function, on input 1000 the unique minority path is the edge from the root to the first variable, whereas on input 1001 there are two minority paths, from the root to the second and to the third variable. In general, since there may be multiple such paths, the minority path is defined to be a random variable over all root-leaf paths. Formally, for every input $x \in \{0,1\}^{4^h}$, we define the minority path $M(x)$ as a random variable over all root-leaf paths in $T_h$ as follows. First, the root is always in $M(x)$. Then, for any node $v$ in $M(x)$, if there is a unique child $w$ of $v$ with value different from that of $v$, then $w \in M(x)$. Otherwise, there are exactly two children with different values, and we put each of them in $M(x)$ with probability $\frac{1}{2}$. Note that with this definition, if $x$ is chosen from the hard distribution $d_h$, conditioned on the node $v$ being in $M(x)$, the first child $v_1$ is in the minority path with probability $\frac{2}{5}$, and the child $v_i$ is in the minority path with probability $\frac{1}{5}$ for $i \in \{2, 3, 4\}$.
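The minority-child probabilities quoted above can be recomputed directly from $d^0$. The sketch below is a check under stated assumptions: the values of $d^0$ are taken as $2/5$, $1/6$ (three inputs), and $1/30$ (three inputs), 4-MAJ breaks ties by the first bit, and `minority_child_probs` is a hypothetical helper name.

```python
from fractions import Fraction

D0 = {
    (1, 0, 0, 0): Fraction(2, 5),
    (0, 0, 1, 1): Fraction(1, 6),
    (0, 1, 0, 1): Fraction(1, 6),
    (0, 1, 1, 0): Fraction(1, 6),
    (0, 0, 0, 1): Fraction(1, 30),
    (0, 0, 1, 0): Fraction(1, 30),
    (0, 1, 0, 0): Fraction(1, 30),
}


def maj4(bits):
    """4-MAJ: majority of four bits, ties broken by the first bit (assumed definition)."""
    ones = sum(bits)
    return bits[0] if ones == 2 else int(ones >= 3)


def minority_child_probs(dist):
    """Probability that each child v_1..v_4 is chosen as the minority child.

    A unique minority child is chosen with certainty; two minority children
    are each chosen with probability 1/2.
    """
    probs = [Fraction(0)] * 4
    for x, p in dist.items():
        value = maj4(x)
        minority = [i for i in range(4) if x[i] != value]
        for i in minority:
            probs[i] += p / len(minority)
    return probs
```

With these inputs the first child receives $2/5$ and each of the other three receives $1/6 + 1/30 = 1/5$; the same holds for $d^1$ by complementing every bit.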

4.2 Complexity of $\text{4-MAJ}_h$ under the hard distribution

We can now lower bound the distributional complexity of $\text{4-MAJ}_h$ under the hard distribution.

Theorem 3. For all $\varepsilon \ge 0$ and $h \ge 0$, we have $\Delta^{d_h}_\varepsilon(\text{4-MAJ}_h) \ge (1 - 2\varepsilon)(16/5)^h$.

To show this, we need to define some quantities. For a deterministic decision tree algorithm $A$ computing $\text{4-MAJ}_h$, let $L_A(x)$ denote the set of variables queried by $A$ on input $x$. Let $B$ be a randomized decision tree algorithm that computes $\text{4-MAJ}_h$ with error $\varepsilon$, and let $b$ be its probability distribution over deterministic algorithms. For any two (not necessarily distinct) nodes $u$ and $v$ of $T_h$, we define the function $E_B(v, u)$ as $E_B(v, u) = \mathbb{E}\big[\,|Z(v) \cap L_A(x)| \;\big|\; u \in M(x)\big]$, where the expectation is taken over $b$, $d_h$, and the randomness in $M(x)$. In words, $E_B(v, u)$ is the expected number of queries below the node $v$ over the randomness of $B$, the hard distribution, and the randomness for the choice of the minority path, under the condition that $u$ is in the minority path. For $0 \le \ell \le h$, we also define the functions $J^\varepsilon_B(h, \ell)$, $K^\varepsilon_B(h, \ell)$, $J^\varepsilon(h, \ell)$, and $K^\varepsilon(h, \ell)$ by

$$J^\varepsilon_B(h, \ell) = \sum_{v \in T_h(\ell)} E_B(v, v), \tag{15}$$

$$K^\varepsilon_B(h, \ell) = \sum_{v \in T_h(\ell)} \left( \frac{2}{5} \sum_{i=2}^{4} E_B(v_i, v_1) + \frac{1}{5} \sum_{j=2}^{4} \sum_{i \ne j} E_B(v_i, v_j) \right), \tag{16}$$

$$J^\varepsilon(h, \ell) = \min_{B \in \mathcal{R}_\varepsilon(\text{4-MAJ}_h)} J^\varepsilon_B(h, \ell) \quad \text{and} \quad K^\varepsilon(h, \ell) = \min_{B \in \mathcal{R}_\varepsilon(\text{4-MAJ}_h)} K^\varepsilon_B(h, \ell). \tag{17}$$

Observe that $J^\varepsilon(h, h) = \min_{B \in \mathcal{R}_\varepsilon(\text{4-MAJ}_h)} \mathbb{E}[C(A, x)] \le \Delta^{d_h}_\varepsilon(\text{4-MAJ}_h)$.

The proof of Theorem 3 essentially follows from the following two lemmas.

Lemma 1. For all $0 < \ell \le h$, we have that $J^\varepsilon(h, \ell) \ge K^\varepsilon(h, \ell) + \frac{1}{5} J^\varepsilon(h, \ell - 1)$.

Proof. This proof mainly involves expanding the quantity $E_B(v, v)$ in terms of $E_B(v_i, v_j)$, where $v_1$, $v_2$, $v_3$, and $v_4$ are the children of $v$. Since, for every node $v$, the set of leaves below $v$ is the disjoint union of the sets of leaves below its children, for every $B$ we have that

$$J^\varepsilon_B(h, \ell) = \sum_{v \in T_h(\ell)} \sum_{i=1}^{4} E_B(v_i, v). \tag{18}$$

By conditioning on the minority child of $v$, we get that

$$J^\varepsilon_B(h, \ell) = \sum_{v \in T_h(\ell)} \sum_{i=1}^{4} \sum_{j=1}^{4} E_B(v_i, v_j) \Pr[v_j \in M(x) \mid v \in M(x)]. \tag{19}$$

As mentioned before, if $x$ is chosen according to the distribution $d_h$ and $v \in M(x)$, then $v_1 \in M(x)$ with probability $\frac{2}{5}$ and $v_i \in M(x)$ with probability $\frac{1}{5}$ for $i \in \{2, 3, 4\}$. Substituting these values we get

$$J^\varepsilon_B(h, \ell) = K^\varepsilon_B(h, \ell) + \frac{1}{5} J^\varepsilon_B(h, \ell - 1) + \frac{1}{5} \sum_{v \in T_h(\ell)} E_B(v_1, v_1). \tag{20}$$

Discarding the last term on the right hand side, which is always non-negative, and taking the minimum over $B$ for all remaining terms gives the result. □
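The passage from (19) to (20) is pure coefficient bookkeeping, which can be checked mechanically for a single node $v$. The sketch below tracks the coefficient of each term $E_B(v_i, v_j)$. It assumes the minority-child probabilities $2/5, 1/5, 1/5, 1/5$ and reads (16) as collecting exactly the off-diagonal terms ($i \ne j$) with those weights; the function names are illustrative.

```python
from fractions import Fraction

# Pr[v_j in M(x) | v in M(x)] for j = 1, 2, 3, 4 (assumed values).
P = [Fraction(2, 5), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5)]


def weights_19():
    """Coefficient of E_B(v_i, v_j) in (19): the probability that v_j is the minority child."""
    return [[P[j] for j in range(4)] for i in range(4)]


def weights_K():
    """Coefficient of E_B(v_i, v_j) in K: off-diagonal terms only (assumed reading of (16))."""
    return [[Fraction(0) if i == j else P[j] for j in range(4)] for i in range(4)]


def weights_20():
    """Coefficients on the right-hand side of (20)."""
    total = weights_K()
    for i in range(4):
        total[i][i] += Fraction(1, 5)  # (1/5) J_B(h, l-1) contributes E_B(v_i, v_i) / 5
    total[0][0] += Fraction(1, 5)      # leftover non-negative term E_B(v_1, v_1) / 5
    return total
```

Under these assumptions `weights_19()` and `weights_20()` coincide entry by entry, which is the content of (20).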

Having established this, we need to relate $K^\varepsilon(h, \ell)$ with $J^\varepsilon(h - 1, \ell - 1)$. Informally, given a randomized algorithm that performs well on $\text{4-MAJ}_h$ at depth $\ell$, we construct another algorithm that performs well on $\text{4-MAJ}_{h-1}$ at depth $\ell - 1$.

Lemma 2. For all $0 < \ell \le h$, we have that $K^\varepsilon(h, \ell) \ge 3 J^\varepsilon(h - 1, \ell - 1)$.

Proof. For any $B \in \mathcal{R}_\varepsilon(\text{4-MAJ}_h)$, we will construct $B' \in \mathcal{R}_\varepsilon(\text{4-MAJ}_{h-1})$ such that

$$\frac{1}{3} K^\varepsilon_B(h, \ell) = J^\varepsilon_{B'}(h - 1, \ell - 1). \tag{21}$$

Taking the minimum over all $B \in \mathcal{R}_\varepsilon(\text{4-MAJ}_h)$ implies the statement.

We start by giving a high-level description of our construction of $B'$ from $B$. First, $B'$ will choose a random injective mapping from $\{x_1, \ldots, x_{4^{h-1}}\}$ to $\{x_1, \ldots, x_{4^h}\}$, identifying each variable of $T_{h-1}$ with some variable of $T_h$. Then, it will choose a random restriction for the remaining variables of $T_h$. Note that these choices are not made uniformly. Let $B_r$ denote the algorithm for $4^{h-1}$ variables defined by $B$ after the identification and the restriction according to randomness $r$. $B'$ then simply executes $B_r$. Our embedding of the smaller instance into the larger instance is done in a way that preserves the output.

We now describe the random identification and restriction in detail. First, observe that there is a natural correspondence between the nodes of $T_{h-1}(\ell - 1)$ and $T_h(\ell)$ (since they are of the same size): we simply map the $i$th node of $T_{h-1}(\ell - 1)$ from the left to the $i$th node of $T_h(\ell)$ from the left. For every node $u \in T_{h-1}(\ell - 1)$, let $v \in T_h(\ell)$ be its corresponding node. The algorithm $B'$ makes the following independent random choices. To generate the random identification, $B'$ randomly chooses a child $w$ of $v$, where $w = v_1$ with probability $\frac{1}{5}$, and $w = v_i$ with probability $\frac{4}{15}$ for $i \in \{2, 3, 4\}$. Then, the variables of $Z(u)$ and the variables of $Z(w)$ are identified naturally, again from left to right.

For generating the random restriction, $B'$ first generates random values for the three siblings of $w$. If $w = v_1$, then it chooses for $(v_2, v_3, v_4)$ one of the six strings from $\{001, 010, 100, 110, 101, 011\}$ uniformly at random. If $w \in \{v_2, v_3, v_4\}$, it chooses for $v_1$ a uniformly random value from $\{0, 1\}$, and for the remaining two siblings, it picks the opposite value. From this, the restriction is generated as follows: for each sibling $w'$ of $w$ with value $b \in \{0, 1\}$, a random string of length $4^{\ell - 1}$ is generated according to $d^b_{\ell - 1}$, and the variables in $Z(w')$ receive the values of this string. This finishes the description of $B'$.
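The random choices of $B'$ at a single node $v$ can be enumerated exhaustively, which gives a mechanical check of the properties the rest of the proof relies on. The sketch below is not part of the original argument: it assumes the selection probabilities $1/5$ for $w = v_1$ and $4/15$ for each of $v_2, v_3, v_4$, the values of $d^0$ from (14), and the tie-breaking rule of 4-MAJ by the first bit; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def maj4(bits):
    """4-MAJ: majority of four bits, ties broken by the first bit (assumed definition)."""
    ones = sum(bits)
    return bits[0] if ones == 2 else int(ones >= 3)


def hard_distribution():
    """The distribution d on {0,1}^4 (inputs of mass 0 are omitted)."""
    d0 = {(1, 0, 0, 0): Fraction(2, 5)}
    for x in [(0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0)]:
        d0[x] = Fraction(1, 6)
    for x in [(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0)]:
        d0[x] = Fraction(1, 30)
    d = {x: p / 2 for x, p in d0.items()}
    d.update({tuple(1 - b for b in x): p / 2 for x, p in d0.items()})
    return d


def embedding_joint():
    """Joint law of (index of w, values of v_1..v_4) when w receives a uniform bit."""
    joint = {}
    half = Fraction(1, 2)

    def add(index, values, prob):
        key = (index, tuple(values))
        joint[key] = joint.get(key, Fraction(0)) + prob

    for b in (0, 1):
        # w = v_1 with probability 1/5; siblings uniform over the six non-constant strings.
        for siblings in product((0, 1), repeat=3):
            if len(set(siblings)) == 2:
                add(0, (b,) + siblings, half * Fraction(1, 5) * Fraction(1, 6))
        # w = v_i for i in {2, 3, 4} with probability 4/15 each; v_1 = a is uniform
        # and the two remaining siblings both receive 1 - a.
        for i in (1, 2, 3):
            for a in (0, 1):
                values = [a, 1 - a, 1 - a, 1 - a]
                values[i] = b
                add(i, values, half * Fraction(4, 15) * half)
    return joint


def induced_distribution(joint):
    """Marginal distribution of the four child values."""
    marginal = {}
    for (_, values), p in joint.items():
        marginal[values] = marginal.get(values, Fraction(0)) + p
    return marginal


def minority_weights(joint):
    """W[i][j] = Pr[w = v_(i+1) and v_(j+1) is chosen as the minority child]."""
    W = [[Fraction(0)] * 4 for _ in range(4)]
    for (i, values), p in joint.items():
        root = maj4(values)
        minority = [j for j in range(4) if values[j] != root]
        for j in minority:
            W[i][j] += p / len(minority)
    return W
```

Under these assumptions, the enumeration shows three things: $w$ always carries the same value as $v$ (it is a majority child), the child values are distributed exactly as $d$, and the joint weights are $2/15$ for the pairs $(v_i, v_1)$ with $i \ne 1$, $1/15$ for the pairs $(v_i, v_j)$ with $j \ne 1$ and $i \ne j$, and $0$ when $i = j$. These are one third of the coefficients $2/5$ and $1/5$ appearing in $K^\varepsilon_B$.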

We now show that $B' \in \mathcal{R}_\varepsilon(\text{4-MAJ}_{h-1})$. Because of the identification of the variables of $Z(u)$ and $Z(w)$, for every $x \in \{0,1\}^{4^{h-1}}$, the value of $u$ coincides with the value of $w$. The random values chosen for the siblings of $w$ are such that whatever value $w$ gets, it is always a majority child of $v$. Therefore, for every input $x$, and for every randomness $r$, the value of $u$ is the same as the value of $v$. Since this holds for every pair of corresponding nodes of $T_{h-1}(\ell - 1)$ and $T_h(\ell)$, for every $x$ and every randomness $r$ the roots of $T_{h-1}$ and $T_h$ have the same value. Since $B$ is an algorithm which computes $\text{4-MAJ}_h$ with error at most $\varepsilon$, this means that $B_r$ is an algorithm which computes $\text{4-MAJ}_{h-1}$ with error at most $\varepsilon$, for every randomness $r$. From this, it follows that $B' \in \mathcal{R}_\varepsilon(\text{4-MAJ}_{h-1})$.

Finally, we prove the equality in (21). For this, the main observation (which can be checked by direct calculation) is that when $w$ gets a random Boolean value, the distribution of values generated by $B'$ on the children of $v$ is exactly the hard distribution $d$. Therefore, $E_{B'}(u, u) = E_B(w, v)$. Consequently, we have that

$$
\begin{aligned}
J^\varepsilon_{B'}(h - 1, \ell - 1) &= \sum_{v \in T_h(\ell)} E_B(w, v) = \sum_{v \in T_h(\ell)} \sum_{i=1}^{4} E_B(v_i, v) \Pr[w = v_i \mid v \in M(x)] \\
&= \sum_{v \in T_h(\ell)} \sum_{i=1}^{4} \sum_{j=1}^{4} E_B(v_i, v_j) \Pr[w = v_i] \Pr[v_j \in M(x) \mid w = v_i, v \in M(x)] \\
&= \frac{1}{3} K^\varepsilon_B(h, \ell).
\end{aligned}
\tag{22}
$$

The third equality holds since the choice of $w$ is independent from the fact that $v$ is in the minority path. For the last equality, we used that the conditional probabilities evaluate to the following values:

$$
\begin{aligned}
\Pr[v_j \in M(x) \mid w = v_j, v \in M(x)] &= 0, && \text{for } j \in \{1, 2, 3, 4\}; \\
\Pr[v_j \in M(x) \mid w = v_1, v \in M(x)] &= \tfrac{1}{3}, && \text{for } j \ne 1; \\
\Pr[v_1 \in M(x) \mid w = v_i, v \in M(x)] &= \tfrac{1}{2}, && \text{for } i \ne 1; \\
\Pr[v_j \in M(x) \mid w = v_i, v \in M(x)] &= \tfrac{1}{4}, && \text{for } i, j \in \{2, 3, 4\} \text{ and } i \ne j.
\end{aligned}
$$

□

We can now return to proving Theorem 3.


Proof of Theorem 3. We claim that, for all $0 \le \ell \le h$, we have that

$$J^\varepsilon(h, \ell) \ge (1 - 2\varepsilon)(16/5)^\ell. \tag{23}$$

The proof is done by induction on $\ell$. For the base case $\ell = 0$, let $B \in \mathcal{R}_\varepsilon(\text{4-MAJ}_h)$. Then, we have that

$$J^\varepsilon_B(h, 0) = \sum_{v \in T_h(0)} \Pr[B \text{ queries } v \mid v \in M(x)]. \tag{24}$$

Observe that any randomized decision tree algorithm computing a nonconstant function with error at most $\varepsilon$ must make at least one query with probability at least $1 - 2\varepsilon$, since otherwise it would output 0 or 1 with probability greater than $\varepsilon$ without making any query, and thus on some input would err too much. Let therefore $A$ be a deterministic algorithm from the support of $B$ which makes at least one query. Then

$$\sum_{v \in T_h(0)} \Pr[A \text{ queries } v \mid v \in M(x)] \ge \sum_{v \in T_h(0)} \Pr[A\text{'s first query is } v \mid v \in M(x)] = 1, \tag{25}$$

since in the summation the term corresponding to the first query of $A$ is 1, whereas all other terms are 0. Thus, $J^\varepsilon(h, 0) \ge 1 - 2\varepsilon$ for all $h \ge 0$.
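Now let $\ell > 0$, and assume the statement holds for $\ell - 1$. For $h \ge \ell$, using Lemma 1 and Lemma 2, we get that $J^\varepsilon(h, \ell) \ge 3 J^\varepsilon(h - 1, \ell - 1) + \frac{1}{5} J^\varepsilon(h, \ell - 1)$. Therefore, by the induction hypothesis, we have that

$$J^\varepsilon(h, \ell) \ge 3(1 - 2\varepsilon)\left(\frac{16}{5}\right)^{\ell - 1} + \frac{1}{5}(1 - 2\varepsilon)\left(\frac{16}{5}\right)^{\ell - 1} = (1 - 2\varepsilon)\left(\frac{16}{5}\right)^{\ell}. \tag{26}$$

The theorem follows when we set $h = \ell$ by noting that $J^\varepsilon(h, h) \le \Delta^{d_h}_\varepsilon(\text{4-MAJ}_h)$. □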
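The arithmetic of the induction is easy to replay with exact fractions. The following sketch iterates the recurrence obtained from Lemma 1 and Lemma 2, with the common factor $1 - 2\varepsilon$ dropped and the base value normalised to 1; `lower_bound` is a hypothetical helper name.

```python
from fractions import Fraction


def lower_bound(level):
    """Iterate bound(l) = 3 * bound(l - 1) + bound(l - 1) / 5, starting from bound(0) = 1.

    The factor (1 - 2 * eps) multiplies every term and is omitted here.
    """
    bound = Fraction(1)
    for _ in range(level):
        bound = 3 * bound + bound / 5
    return bound
```

Each step multiplies the bound by $3 + \frac{1}{5} = \frac{16}{5}$, so `lower_bound(l)` agrees with $(16/5)^l$ for every level.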

Combining Proposition 9 and Theorem 3 gives us our main result, an asymptotic separation between deterministic subcube partition complexity and randomized query complexity:

Theorem 1. There exists a function $f = (f_h)$, with $f_h : \{0,1\}^{4^h} \to \{0,1\}$, such that $D^{\mathrm{sc}}(f) \le 3^h$, but $D(f) = 4^h$, $R_0(f) \ge 3.2^h$, and $R(f) = \Omega(3.2^h)$.

We can also immediately deduce that the $\text{4-MAJ}_h$ function positively answers both Question 1 and Question 2.

Corollary 1. We have that $R_0^{\mathrm{sc}}(\text{4-MAJ}_h) = o(R_0(\text{4-MAJ}_h))$.

Corollary 2. For $0 < \varepsilon < 1/3$, we have that $\mathrm{PPRT}_\varepsilon(\text{4-MAJ}_h) = o(R_\varepsilon(\text{4-MAJ}_h))$.

5 Discussion and open problems

Our main result is actually stronger than stated. In addition to the zero-error and $\varepsilon$-error randomized query complexities we defined, we can also define $\varepsilon$-error expected randomized complexity. In this model, we only charge for the expected number of queries made by the randomized algorithm, like in the zero-error case, but we also allow the algorithm to err. Formally, the $\varepsilon$-error expected randomized query complexity of $f$ is $R^{\mathrm{exp}}_\varepsilon(f) = \min_{B \in \mathcal{R}_\varepsilon(f)} \max_x C(B, x)$. Observe that this generalizes zero-error randomized query complexity, since $R^{\mathrm{exp}}_0(f) = R_0(f)$, and it is immediate that, for all $\varepsilon \ge 0$, we have that $R^{\mathrm{exp}}_\varepsilon(f) \le R_\varepsilon(f) \le D(f)$.

Randomized query complexity is usually defined in the worst case [BdW02], that is, as $R_\varepsilon(f)$ instead of $R^{\mathrm{exp}}_\varepsilon(f)$. The main reason for not dealing with these measures separately is that worst case and expected randomized complexities are closely related. We have already observed that (obviously), in expectation, one cannot make more queries than in the worst case. On the other hand, if for some constant $\eta > 0$ we let the randomized algorithm that achieves $R^{\mathrm{exp}}_\varepsilon(f)$ make $\frac{1}{\eta} R^{\mathrm{exp}}_\varepsilon(f)$ queries, and give a random answer in case the computation is not finished, we get an algorithm of error $\varepsilon + \eta$ which never makes more than $\frac{1}{\eta} R^{\mathrm{exp}}_\varepsilon(f)$ queries. Therefore, for all $\varepsilon \ge 0$ and $\eta > 0$, we have that $R_{\varepsilon + \eta}(f) \le \frac{1}{\eta} R^{\mathrm{exp}}_\varepsilon(f)$.
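The truncation argument is an application of Markov's inequality: a non-negative query count exceeds $1/\eta$ times its expectation with probability at most $\eta$. The sketch below illustrates this on a made-up distribution of query counts for one fixed input. The distribution and the helper name `truncation_overhead` are purely hypothetical, and the cutoff uses the expectation on that input, which is at most the worst-case expectation used in the text.

```python
from fractions import Fraction


def truncation_overhead(query_dist, eta):
    """Return (expected queries, cutoff, probability of exceeding the cutoff).

    query_dist maps a query count to its probability for one fixed input;
    the cutoff is the expectation divided by eta.
    """
    expected = sum(q * p for q, p in query_dist.items())
    cutoff = expected / eta
    tail = sum(p for q, p in query_dist.items() if q > cutoff)
    return expected, cutoff, tail


# Hypothetical example: usually 1 query, occasionally 50.
example = {1: Fraction(9, 10), 50: Fraction(1, 10)}
expected, cutoff, tail = truncation_overhead(example, Fraction(1, 4))
```

In this example the probability of being cut off is at most $\eta$, so answering randomly in that case adds at most $\eta$ to the error.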

The result we show actually lower bounds $R^{\mathrm{exp}}_\varepsilon(f)$ as well. Thus, a stronger version of our result is the following: For all $\varepsilon \ge 0$, $R^{\mathrm{exp}}_\varepsilon(\text{4-MAJ}_h) \ge (1 - 2\varepsilon)(3.2)^h$.

We end with some open problems. It would be interesting to exactly pin down the randomized query complexity of $\text{4-MAJ}_h$. For example, we know that $R_0(\text{4-MAJ}_h) \ge 3.2^h$ and $R_0(\text{4-MAJ}_h) \le 3.25^h$. The best separation between subcube partition complexity and query complexity remains open, even in the deterministic case. For example, we know that $D^{\mathrm{sc}}(f) \le D(f)$ and $D(f) \le (D^{\mathrm{sc}}(f))^2$, so they are at most quadratically different. The $\text{4-MAJ}_h$ function shows that there exists a function for which $D(f) \ge D^{\mathrm{sc}}(f)^{\log_3 4} > (D^{\mathrm{sc}}(f))^{1.26}$. Can this separation or the quadratic upper bound be improved?
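The exponents in these separations follow from comparing the bases of the exponential bounds. As a small numerical illustration, with a hypothetical helper `separation_exponent`, an upper bound of $3^h$ against a lower bound of $4^h$ gives the exponent $\log_3 4$, and against $3.2^h$ it gives $\log_3 3.2$:

```python
import math


def separation_exponent(upper_base, lower_base):
    """Exponent e such that lower_base**h == (upper_base**h)**e for every h."""
    return math.log(lower_base) / math.log(upper_base)


deterministic_gap = separation_exponent(3, 4)    # D versus D^sc for 4-MAJ_h
randomized_gap = separation_exponent(3, 3.2)     # 3.2^h lower bound versus the 3^h bound on D^sc
```

The first value is slightly above 1.26, matching the exponent quoted above. The second is only slightly above 1, so the randomized separation is polynomial but with a small exponent.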

Finally, it would be interesting to know if the partition bounds also lower bound expected randomized query complexity, and in particular whether the zero-error partition bound lower bounds zero-error randomized query complexity.